14 research outputs found

    Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

    Full text link
    Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum. However, most datasets for symbolic music are very small, which potentially limits the performance of data-driven multimodal models. An intuitive solution to this problem is to leverage pre-trained models from other modalities (e.g., natural language) to improve the performance of symbolic music-related multimodal tasks. In this paper, we carry out the first study of generating complete and semantically consistent symbolic music scores from text descriptions, and explore the efficacy of using publicly available checkpoints (i.e., BERT, GPT-2, and BART) for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity. We analyse the capabilities and limitations of our model to better understand the potential of language-music models. Comment: 5 pages, 2 figures, 2 tables
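    Edit distance similarity, one of the metrics reported above, is commonly computed as a Levenshtein distance normalized by the longer sequence's length; a minimal sketch (the paper's exact tokenization and normalization are assumptions here):

```python
def edit_distance_similarity(a, b):
    """Levenshtein distance between two token sequences,
    normalized to a similarity score in [0, 1]."""
    m, n = len(a), len(b)
    d = list(range(n + 1))          # single-row dynamic programming
    for i in range(1, m + 1):
        prev, d[0] = d[0], i        # prev holds old d[j-1]
        for j in range(1, n + 1):
            prev, d[j] = d[j], min(
                d[j] + 1,                       # deletion
                d[j - 1] + 1,                   # insertion
                prev + (a[i - 1] != b[j - 1]),  # substitution
            )
    return 1 - d[n] / max(m, n, 1)
```

    Identical sequences score 1.0, and entirely different sequences approach 0.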

    Gamma Sampling: Fine-grained Controlling Language Models without Training

    Full text link
    The dominant approaches for controlling language models are effective at steering high-level attributes (e.g. topic and sentiment). However, these methods often require condition-specific data or are computationally expensive. We propose a simple new guided-decoding method, Gamma Sampling, which does not require any training data to achieve fine-grained controllable text generation while maintaining a fast generation speed. Gamma Sampling introduces attribute-related information (provided by humans or language models themselves) into the sampling process to guide language models to generate texts with desired attributes. Since no training is involved, Gamma Sampling can be easily applied to any language model for controllable text generation. Through experiments, we show that Gamma Sampling-steered GPT2-small (117M) outperforms baselines such as PPLM (345M) and CTRL (1.6B) in diversity, attribute relevance, and overall quality of generated samples. Comment: 20 pages, 5 figures
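    The core idea, reweighting attribute-related token probabilities at sampling time without any training, can be sketched roughly as follows. Note that the attribute token set, the parameter name, and the exact reweighting formula below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def gamma_reweight(probs, attr_ids, gamma):
    """Shift probability mass toward attribute-related tokens before
    sampling (illustrative). gamma in (0, 1): smaller gamma steers harder."""
    p = probs.astype(float).copy()
    mass = p[attr_ids].sum()
    target = mass ** gamma          # gamma < 1 raises the attribute mass
    p[attr_ids] *= target / mass
    rest = np.setdiff1d(np.arange(len(p)), attr_ids)
    p[rest] *= (1.0 - target) / (1.0 - mass)
    return p                        # still a valid distribution
```

    Sampling then proceeds from the reweighted distribution instead of the model's raw one, which is why the method plugs into any language model.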

    Chord-Conditioned Melody Choralization with Controllable Harmonicity and Polyphonicity

    Full text link
    Melody choralization, i.e. generating a four-part chorale based on a user-given melody, has long been closely associated with J.S. Bach chorales. Previous neural network-based systems rarely focus on chorale generation conditioned on a chord progression, and none of them realised controllable melody choralization. To enable neural networks to learn the general principles of counterpoint from Bach's chorales, we first design a music representation that encodes chord symbols for chord conditioning. We then propose DeepChoir, a melody choralization system, which can generate a four-part chorale for a given melody conditioned on a chord progression. Furthermore, with the improved density sampling, a user can control the extent of harmonicity and polyphonicity of the chorale generated by DeepChoir. Experimental results reveal the effectiveness of our data representation and the controllability of DeepChoir over harmonicity and polyphonicity. The code and generated samples (chorales, folk songs and a symphony) of DeepChoir, and the dataset we used, are now available at https://github.com/sander-wood/deepchoir. Comment: 7 pages, 4 figures, 2 tables

    TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching

    Full text link
    This paper introduces TunesFormer, an efficient Transformer-based dual-decoder model specifically designed for the generation of melodies that adhere to user-defined musical forms. Trained on 214,122 Irish tunes, TunesFormer utilizes techniques including bar patching and control codes. Bar patching reduces sequence length and generation time, while control codes guide TunesFormer in producing melodies that conform to desired musical forms. Our evaluation demonstrates TunesFormer's superior efficiency, being 3.22 times faster than GPT-2 and 1.79 times faster than a model with linear complexity of equal scale, while offering comparable performance in controllability and other metrics. TunesFormer provides a novel tool for musicians, composers, and music enthusiasts alike to explore the vast landscape of Irish music. Our model and code are available at https://github.com/sander-wood/tunesformer. Comment: 5 pages, 3 figures, 1 table
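    Bar patching shortens the input by treating each bar, rather than each character, as one modeling unit. A minimal sketch of the idea on an ABC tune body (the splitting here is simplified and ignores repeat barlines such as `|:` and `||`):

```python
def bar_patch(abc_body):
    """Split the body of an ABC tune into bar-level patches, so the
    model consumes one unit per bar instead of many character tokens."""
    return [bar.strip() for bar in abc_body.split("|") if bar.strip()]

tune = "G2 A2 B2 c2 | d2 c2 B2 A2 | G4 z4 |"
patches = bar_patch(tune)   # 3 bar patches instead of ~40 character tokens
```

    Each patch is then embedded and decoded as a unit, which is what drives the reduction in sequence length and generation time.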

    WikiMT++ Dataset Card

    Full text link
    WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT), featuring 1010 curated lead sheets in ABC notation. To expand the application scenarios of WikiMT, we add both objective attributes (album, lyrics, video) and subjective emotion attributes (12 emotion adjectives and emo_4q, based on Russell's 4Q model), enhancing its usability for music information retrieval, conditional music generation, automatic composition, emotion classification, etc. Additionally, CLaMP is implemented to correct the attributes inherited from WikiMT, reducing errors introduced during the original data collection and enhancing the accuracy and completeness of our dataset.

    CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

    Full text link
    We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss. To pre-train CLaMP, we collected a large dataset of 1.4 million music-text pairs. CLaMP employs text dropout as a data augmentation technique and bar patching to efficiently represent music data, reducing sequence length to less than 10%. In addition, we developed a masked music model pre-training objective to enhance the music encoder's comprehension of musical context and structure. CLaMP integrates textual information to enable semantic search and zero-shot classification for symbolic music, surpassing the capabilities of previous models. To support the evaluation of semantic search and music classification, we publicly release WikiMusicText (WikiMT), a dataset of 1010 lead sheets in ABC notation, each accompanied by a title, artist, genre, and description. In comparison to state-of-the-art models that require fine-tuning, zero-shot CLaMP demonstrated comparable or superior performance on score-oriented datasets. Our models and code are available at https://github.com/microsoft/muzic/tree/main/clamp. Comment: 11 pages, 5 figures, 5 tables, accepted by ISMIR 2023
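    The joint training of the two encoders with a contrastive loss can be illustrated with a generic CLIP-style symmetric objective over a batch of paired embeddings (a sketch; the temperature value and exact loss details are assumptions, not CLaMP's published configuration):

```python
import numpy as np

def symmetric_contrastive_loss(music_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric contrastive (InfoNCE) loss for a batch of
    paired music/text embeddings; matched pairs sit on the diagonal."""
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = m @ t.T / temperature      # pairwise cosine similarities
    n = len(logits)

    def xent(lg):  # cross-entropy with the diagonal as the target class
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average of music-to-text and text-to-music directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

    Minimizing this loss pulls each lead sheet's embedding toward its own description and away from the other descriptions in the batch, which is what enables zero-shot semantic search and classification.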

    Visible-Light-Active Titanium Sulfonate Framework for Photocatalytic Organic Synthesis

    No full text
    In this work, the first visible-light-active titanium sulfonate metal–organic framework (denoted as FIR-138) with 2-fold interpenetrated srs topology was synthesized by employing 2,5-dihydroxy-1,4-benzenedisulfonic acid (H4DOBSC) as the ligand. The strong chelating coordination ability of the hydroxyl and sulfonate O atoms of H4DOBSC endows the framework of FIR-138 with good stability, while the formation of the Ti-phenolic motif ensures excellent visible light absorption with a bandgap (Eg) of 1.74 eV. More importantly, the extensive titanium active sites within the structure trap the photogenerated electrons and promote charge separation effectively, contributing to the excellent visible-light photocatalytic performance in organic reactions. FIR-138's capability to harness visible light for photocatalytic reactions presents a promising advancement in the field of Ti-MOF photocatalysts. These results provide valuable insights and open up new avenues for the rational design and synthesis of visible-light-active Ti-MOF photocatalysts.